Day 17 - Regular expressions - Groups
$ cat examples.txt | grep -oP "[A-Za-z ]+(?=[0-9]+)"
Police
H
R
D
Johnny
Cyborg
The natural counterpart of the positive lookahead is the negative lookahead, which is expressed by
?!. You can start to see a pattern here (apt, speaking of regular expressions). Lookaround groups are
introduced by a ? followed by a criterion, which can be equality (=) or inequality (!).
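The difference between ?= and ?! can be sketched on a tiny input. This example assumes GNU grep built with PCRE support (the -P option) and uses a made-up sample instead of examples.txt; the variable names are purely illustrative.

```shell
# Hypothetical sample input (not the book's examples.txt).
sample='Johnny5
Spiderman
R2-D2'

# Positive lookahead: runs of letters that ARE followed by a digit.
with_digit=$(printf '%s\n' "$sample" | grep -oP '[A-Za-z]+(?=[0-9])')

# Negative lookahead: runs of letters NOT followed by a letter or a digit.
# Letters are part of the lookahead on purpose: with (?![0-9]) alone the
# engine would backtrack and happily report "Johnn", because that shorter
# run is followed by "y", which is not a digit.
without_digit=$(printf '%s\n' "$sample" | grep -oP '[A-Za-z]+(?![A-Za-z0-9])')

printf 'followed by a digit:\n%s\n' "$with_digit"
printf 'not followed by a digit:\n%s\n' "$without_digit"
```

The first command prints Johnny, R and D, while the second prints only Spiderman. The backtracking detail in the comment is a good example of how lookarounds can surprise you.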
You can also use lookbehind expressions, which start with ?< instead of a plain ?.
These expressions match patterns that follow the lookbehind group, without including the group itself in the match, for example:
$ cat examples.txt | grep -oP "(?<=[A-Z])[a-z]+"
ug
og
olice
ohnny
pider
an
yborg
ig
ad
olf
ony
ictures
This matches every run of lowercase letters that follows an uppercase letter, without including the uppercase letter itself.
The lookbehind expression is (?<=[A-Z]).
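Following the same = and ! pattern, a negative lookbehind is written ?<!. The sketch below compares the two forms, again assuming GNU grep with PCRE support (-P) and a made-up sample; the variable names are illustrative only.

```shell
# Hypothetical sample input (not the book's examples.txt).
sample='Iron man
the Hulk'

# Positive lookbehind: lowercase runs that come right after an uppercase letter.
after_upper=$(printf '%s\n' "$sample" | grep -oP '(?<=[A-Z])[a-z]+')

# Negative lookbehind: lowercase runs NOT preceded by any letter,
# that is, whole words that start with a lowercase letter.
lowercase_words=$(printf '%s\n' "$sample" | grep -oP '(?<![A-Za-z])[a-z]+')

printf 'after an uppercase letter:\n%s\n' "$after_upper"
printf 'lowercase words:\n%s\n' "$lowercase_words"
```

The first command prints ron and ulk, the second prints man and the. Note that the negative lookbehind also succeeds at the very beginning of a line, where there is no previous character at all.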
A warning: lookaround expressions are often difficult to manage, and their behaviour can be
surprising because it strongly depends on the implementation of the engine. You won’t hit such
complex cases now that you have just learned how to use groups, but it might happen in the future. For
the time being, please keep in mind that there are important things to learn about regular expression
engines, such as whether they are greedy or not. This book is meant to be a primer, so I will simply pretend
those issues do not exist, but remember that there is a lot to learn out there!
Back-references are actually supported by grep, but their behaviour can be surprising. The code
$ cat examples.txt | grep -E "[A-Z]([0-9])-[A-Z]\1"
R2-D2